Data Analysis

Writer: Javier Silva-Valencia
Affiliation: Instituut Voor Tropische Geneeskunde, Antwerp, Belgium

Published: 2023-02-28

Abstract
Today we are going to create an index/score. We have many variables on socioeconomic aspects; we will clean them and select the ones used to build the index/score.


Analysis

Installing packages

#install.packages("epiR")
library(epiR)
Warning: package 'epiR' was built under R version 4.2.2
Loading required package: survival
Package epiR 2.0.57 is loaded
Type help(epi.about) for summary information
Type browseVignettes(package = 'epiR') to learn how to use epiR for applied epidemiological analyses

Now we will prepare the variables for a multivariate analysis.

final_dataset = readRDS("final_dataset.RDS")
final_dataset$case <- final_dataset$suffered_vl_since_2nd_survey

A brick wall is different from a fence wall.

Mushahar caste

table(final_dataset$household_head_subcaste)

                       AHIR        ANSARI        BADHAI         BANIA 
            4            18          7109             5             5 
       BANIYA         BARAI         BHATT      BHUMIHAR      BRAHAMAN 
         2178             7            55          8457          1513 
       CHAMAR    CHOURASIYA         DHOBI       DHUNIYA          DOME 
         5090           206          1273           570            54 
       DUSADH        DUSHAD         FAKIR        GARERI        HAJJAM 
          195            15           260           395          1968 
       IDRISI        JULAHA         KAHAR     KANAUJIYA          KANU 
           53           102           579            68           193 
     KAYASTHA          KHAN         KOIRI        KUMHAR       KURESHI 
          327           159          1085          1873            36 
        KURMI        LAHERI         LOHAR         MAHTO          MALI 
         4059             6          2423             5           132 
       MALLAH      MANSOORI       MANSURI    MIR SHIKAR     MIRSHIKAR 
         1734          1913             1           125             2 
     MUSHAHAR  MUSLIM DHOBI           NAI         NONIA        NONIYA 
         1980            55            33             7          5978 
         PASI        PASWAN          RAIN        RAJPUT SABAZI FAROSH 
          172          4748          2242          2009           425 
          SAH         SAIAD         SAYIN          SHAH        SHEIKH 
          121            85            43           143          9203 
        SHEKH         SONAR          SUDI         SUNNI        TAMOLI 
           32           599             4            11            29 
       TATAWA         TATMA         TATWA          TELI        TIWARI 
         1045            10             7          4027             2 
        YADAV 
         3952 

We’ll make an additional factor, ‘mushahar’, which is TRUE when ‘household_head_subcaste’ equals “MUSHAHAR”.

final_dataset$mushahar <- ifelse(final_dataset$household_head_subcaste == "MUSHAHAR", TRUE, FALSE)
table(final_dataset$household_head_subcaste, final_dataset$mushahar)
               
                FALSE TRUE
                    4    0
  AHIR             18    0
  ANSARI         7109    0
  BADHAI            5    0
  BANIA             5    0
  BANIYA         2178    0
  BARAI             7    0
  BHATT            55    0
  BHUMIHAR       8457    0
  BRAHAMAN       1513    0
  CHAMAR         5090    0
  CHOURASIYA      206    0
  DHOBI          1273    0
  DHUNIYA         570    0
  DOME             54    0
  DUSADH          195    0
  DUSHAD           15    0
  FAKIR           260    0
  GARERI          395    0
  HAJJAM         1968    0
  IDRISI           53    0
  JULAHA          102    0
  KAHAR           579    0
  KANAUJIYA        68    0
  KANU            193    0
  KAYASTHA        327    0
  KHAN            159    0
  KOIRI          1085    0
  KUMHAR         1873    0
  KURESHI          36    0
  KURMI          4059    0
  LAHERI            6    0
  LOHAR          2423    0
  MAHTO             5    0
  MALI            132    0
  MALLAH         1734    0
  MANSOORI       1913    0
  MANSURI           1    0
  MIR SHIKAR      125    0
  MIRSHIKAR         2    0
  MUSHAHAR          0 1980
  MUSLIM DHOBI     55    0
  NAI              33    0
  NONIA             7    0
  NONIYA         5978    0
  PASI            172    0
  PASWAN         4748    0
  RAIN           2242    0
  RAJPUT         2009    0
  SABAZI FAROSH   425    0
  SAH             121    0
  SAIAD            85    0
  SAYIN            43    0
  SHAH            143    0
  SHEIKH         9203    0
  SHEKH            32    0
  SONAR           599    0
  SUDI              4    0
  SUNNI            11    0
  TAMOLI           29    0
  TATAWA         1045    0
  TATMA            10    0
  TATWA             7    0
  TELI           4027    0
  TIWARI            2    0
  YADAV          3952    0

Number of people living in each household

First I create a variable ‘hh_size’ which is the number of rows per FSN:

final_dataset$hh_size <- ave(final_dataset$FSN, final_dataset$FSN, FUN = length)
#Calculate the
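
To make the behaviour of ave() concrete, here is a minimal sketch on made-up data. It assumes, as the text above suggests, that FSN identifies a household and that each row is one household member; the toy values are hypothetical.

```r
# Toy data: FSN is assumed to be the household identifier (one row per member).
toy <- data.frame(FSN = c(101, 101, 101, 202, 303, 303))

# ave() applies FUN within each FSN group and returns a vector as long as the input,
# so every member row receives the size of its own household.
toy$hh_size <- ave(toy$FSN, toy$FSN, FUN = length)

print(toy)
```

Because the result has one value per row, it can be stored directly as a new column without any merge step.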

Use summary() to confirm that the variable hh_size ranges from 1 to 35.

summary(final_dataset$hh_size)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   5.000   7.000   7.638   9.000  35.000 

I also create a variable ‘large_hh’ based on ‘hh_size’: TRUE for households with more than 7 members, FALSE otherwise.

final_dataset$large_hh <- ifelse(final_dataset$hh_size > 7, TRUE, FALSE)

Now compute a new variable ‘net3’ as No_Mosquito_Net / hh_size, the number of bednets per household member. Then recode it: values below 0.333 become FALSE and values of 0.333 or higher become TRUE.

final_dataset$net3 <- final_dataset$No_Mosquito_Net / final_dataset$hh_size

net3 is TRUE when the household has at least one bednet per three persons, otherwise FALSE.

final_dataset$net3 <- final_dataset$net3 >= 0.333
table(final_dataset$net3)

FALSE  TRUE 
62676 18538 
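
The recoding can be checked on a few hypothetical households. This is only an illustrative sketch: the vectors below are invented, and the threshold 0.333 is the one used in the code above. Because 0.333 is slightly below 1/3, a household with exactly one net per three members is counted as TRUE.

```r
# Hypothetical households: bednets owned and number of members.
nets    <- c(1, 2, 0, 3)
hh_size <- c(3, 7, 4, 6)

# Bednets per person, then the same cut-off as in the analysis.
nets_per_person <- nets / hh_size
net3 <- nets_per_person >= 0.333

data.frame(nets, hh_size, nets_per_person = round(nets_per_person, 3), net3)
```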

Check whether net3 and mushahar are each associated with case (bivariate logistic regression).

LR_Model3 <- glm(case ~ net3, family=binomial, data=final_dataset)
summary(LR_Model3)

Call:
glm(formula = case ~ net3, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0767  -0.0767  -0.0767  -0.0767   3.6584  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.82786    0.07383 -78.937  < 2e-16 ***
net3TRUE    -0.86298    0.22132  -3.899 9.65e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2867.0  on 81212  degrees of freedom
AIC: 2871

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model3))
(Intercept)    net3TRUE 
0.002944377 0.421901162 
exp(confint(LR_Model3))
Waiting for profiling to be done...
                  2.5 %      97.5 %
(Intercept) 0.002538656 0.003391284
net3TRUE    0.266266770 0.636608766
LR_Model4 <- glm(case ~ mushahar, family=binomial, data=final_dataset)
summary(LR_Model4)

Call:
glm(formula = case ~ mushahar, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1805  -0.0665  -0.0665  -0.0665   3.4972  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.11316    0.07568  -80.78   <2e-16 ***
mushaharTRUE  2.00434    0.19362   10.35   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2817.5  on 81212  degrees of freedom
AIC: 2821.5

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model4))
 (Intercept) mushaharTRUE 
 0.002213537  7.421202699 
round(exp(confint(LR_Model3)),2)
Waiting for profiling to be done...
            2.5 % 97.5 %
(Intercept)  0.00   0.00
net3TRUE     0.27   0.64
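
As a cross-check on the odds ratios, the following sketch recomputes the OR for net3 and an approximate 95% Wald interval by hand from the estimate and standard error printed in the summary() output above. Note that confint() on a glm reports profile-likelihood intervals, so the Wald interval here is an approximation and should be close to, but not identical with, the 0.27 to 0.64 shown above.

```r
# Estimate and standard error for net3TRUE, copied from the summary() output.
beta <- -0.86298
se   <-  0.22132

z <- qnorm(0.975)                          # about 1.96 for a 95% interval
or      <- exp(beta)                       # odds ratio
wald_ci <- exp(beta + c(-1, 1) * z * se)   # Wald interval on the OR scale

round(c(OR = or, lower = wald_ci[1], upper = wald_ci[2]), 3)
```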

Now make a factor ‘risk_wall’ with three levels (thatched wall, unplastered brick wall and plastered brick wall) and check its association with case.

2 =brick 1 = 3 =

final_dataset$risk_wall <- ifelse(
  final_dataset$Wall_Material %in% c(6,164,166), 2, ifelse(
    final_dataset$Wall_Material %in% 162:163, 1, ifelse(
      final_dataset$Wall_Material == 165, 3, NA)))

final_dataset$risk_wall <- as.factor(final_dataset$risk_wall)
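
The nested ifelse() can be tried on a small vector to see how each wall code is mapped. This sketch reuses the codes from the chunk above; the value 999 is a made-up code added only to show that anything not listed ends up as NA.

```r
# Wall material codes from the recoding above, plus a hypothetical unmapped code (999).
wall_codes <- c(6, 162, 163, 164, 165, 166, 999)

risk_wall <- ifelse(
  wall_codes %in% c(6, 164, 166), 2, ifelse(
    wall_codes %in% 162:163, 1, ifelse(
      wall_codes == 165, 3, NA)))

data.frame(wall_codes, risk_wall)
```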

Ownership of animals

final_dataset$own_buf <- ifelse(final_dataset$count_Buf > 0, TRUE, FALSE)
final_dataset$own_cow <- ifelse(final_dataset$count_Cow > 0, TRUE, FALSE)
final_dataset$own_Goa <- ifelse(final_dataset$count_Goa > 0, TRUE, FALSE)
final_dataset$own_Pou <- ifelse(final_dataset$count_Pou > 0, TRUE, FALSE)

Here I create a variable age group

final_dataset$agegrp <- cut(final_dataset$member_age, breaks=c(0,9,19,29,39,100))
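
One detail of cut() worth checking: by default the intervals are open on the left, so an age of exactly 0 does not fall in (0,9] and becomes NA. The sketch below uses invented ages to show this and one possible alternative with include.lowest = TRUE. Whether infants aged 0 explain any of the missing age groups in this dataset is an assumption that would need to be verified against member_age.

```r
# Hypothetical ages, including the boundary values 0 and 100.
ages   <- c(0, 5, 9, 10, 39, 40, 100, 101)
breaks <- c(0, 9, 19, 29, 39, 100)

# Default: intervals are (lower, upper], so age 0 and ages above 100 become NA.
grp_default <- cut(ages, breaks = breaks)

# include.lowest = TRUE closes the first interval, so age 0 falls in [0,9].
grp_closed <- cut(ages, breaks = breaks, include.lowest = TRUE)

data.frame(ages, grp_default, grp_closed)
```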

and a variable ‘female’

final_dataset$female <- ifelse(final_dataset$member_sex ==3, TRUE, FALSE)

and a variable Damp_floor

table(final_dataset$Is_Floor_Damp)

    0     1    32 
 8499 72711     4 
final_dataset$Damp_floor <- ifelse(final_dataset$Is_Floor_Damp == 1, TRUE, FALSE)
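
Note that the recoding above places the four rows with the unexpected value 32 in the FALSE group. If those rows should instead be treated as missing, one possible variant is sketched below on invented values; whether 32 is a data-entry error or a valid code is not known from the output, so this is an assumption.

```r
# Hypothetical values; 32 stands in for the unexpected code seen in the table.
is_floor_damp <- c(0, 1, 32, 1, 0)

# Recoding as in the analysis: anything other than 1 becomes FALSE.
damp_original <- ifelse(is_floor_damp == 1, TRUE, FALSE)

# Stricter variant: only 0 and 1 are accepted, other codes become NA.
damp_strict <- ifelse(is_floor_damp %in% c(0, 1), is_floor_damp == 1, NA)

table(damp_original, useNA = "ifany")
table(damp_strict, useNA = "ifany")
```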

Recoding the variables Sprayed_20** to binary

table(final_dataset$Sprayed_2009)

   86    87    88 
71286  9921     7 
table(final_dataset$Sprayed_2010)

   86    87    88 
66776 14409    29 
final_dataset$IRS_09 <- ifelse(final_dataset$Sprayed_2009 > 86, TRUE, FALSE)
final_dataset$IRS_10 <- ifelse(final_dataset$Sprayed_2010 > 86, TRUE, FALSE)

Analysis

Inputs for my table on the study population:

table(final_dataset$agegrp)

   (0,9]   (9,19]  (19,29]  (29,39] (39,100] 
   19447    17803    12098     9925    17540 
table(final_dataset$large_hh)

FALSE  TRUE 
48783 32431 
table(final_dataset$asset_index)

    1     2     3     4     5 
15410 12934 15813 17312 19745 
table(final_dataset$Bamboo_Tree)

    0     1 
49659 31555 
table(final_dataset$Banana_Tree)

    0     1 
11465 69749 
table(final_dataset$case)

    0     1 
81007   207 
table(final_dataset$indoor_Buf)
< table of extent 0 >
table(final_dataset$indoor_Cow)
< table of extent 0 >
table(final_dataset$indoor_Goa)
< table of extent 0 >
table(final_dataset$Is_Floor_Damp)

    0     1    32 
 8499 72711     4 
table(final_dataset$mushahar)

FALSE  TRUE 
79234  1980 
table(final_dataset$Neem_Tree)

    0     1 
47378 33836 
table(final_dataset$net3)

FALSE  TRUE 
62676 18538 
table(final_dataset$own_bov)
< table of extent 0 >
table(final_dataset$Perm_Water_Body)

    0     1 
41962 39252 
table(final_dataset$Rice_Field)

    0     1 
16637 64577 
table(final_dataset$risk_wall)

    1     2     3 
26627 35412 19175 
table(final_dataset$Sprayed_2009)

   86    87    88 
71286  9921     7 
table(final_dataset$Sprayed_2010)

   86    87    88 
66776 14409    29 
table(final_dataset$female)

FALSE  TRUE 
42423 38791 
table(final_dataset$own_buf)

FALSE  TRUE 
37183 11017 
table(final_dataset$own_cow)

FALSE  TRUE 
26794 21406 
table(final_dataset$own_Goa)

FALSE  TRUE 
22504 25696 
table(final_dataset$own_Pou)

FALSE  TRUE 
44707  3493 
table(final_dataset$Damp_floor)

FALSE  TRUE 
 8503 72711 
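
Two things in the tables above deserve a quick check. `< table of extent 0 >` is what table() prints for a NULL input, which is what `$` returns when a column does not exist in the data frame. The animal ownership tables also sum to 48,200 rather than 81,214, which suggests missing values. The sketch below, on a made-up data frame, shows one way to list absent columns and to make missing values visible in a table.

```r
# Toy data frame standing in for final_dataset (values are invented).
toy <- data.frame(own_cow = c(TRUE, FALSE, NA, TRUE), Neem_Tree = c(0, 1, 1, 0))

# Which of the variables we want to tabulate are not present in the data?
wanted       <- c("own_cow", "Neem_Tree", "own_bov", "indoor_Cow")
missing_cols <- setdiff(wanted, names(toy))
missing_cols

# useNA = "ifany" adds an <NA> category to the counts when values are missing.
table(toy$own_cow, useNA = "ifany")
```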

Calculate the OR for the exposure female

LR_Model5 <- glm(case ~ female, family=binomial, data=final_dataset)
summary(LR_Model5)

Call:
glm(formula = case ~ female, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0727  -0.0727  -0.0727  -0.0700   3.4676  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.93430    0.09461 -62.726   <2e-16 ***
femaleTRUE  -0.07531    0.13964  -0.539     0.59    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2885.6  on 81212  degrees of freedom
AIC: 2889.6

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model5))
(Intercept)  femaleTRUE 
0.002647066 0.927454928 

Calculate the OR for the exposure large_hh

LR_Model5a <- glm(case ~ large_hh, family=binomial, data=final_dataset)
summary(LR_Model5a)

Call:
glm(formula = case ~ large_hh, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0724  -0.0724  -0.0708  -0.0708   3.4615  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.98861    0.09064 -66.073   <2e-16 ***
large_hhTRUE  0.04702    0.14145   0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2885.8  on 81212  degrees of freedom
AIC: 2889.8

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model5a))
 (Intercept) large_hhTRUE 
 0.002507141  1.048140576 

Calculate the OR for the exposure age group

LR_Model6 <- glm(case ~ agegrp, family=binomial, data=final_dataset)
summary(LR_Model6)

Call:
glm(formula = case ~ agegrp, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0779  -0.0772  -0.0732  -0.0696   3.4810  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.9215     0.1388 -42.648   <2e-16 ***
agegrp(9,19]     0.1264     0.1946   0.650    0.516    
agegrp(19,29]    0.1072     0.2171   0.494    0.621    
agegrp(29,39]   -0.1008     0.2470  -0.408    0.683    
agegrp(39,100]  -0.1348     0.2091  -0.645    0.519    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2862.8  on 76812  degrees of freedom
Residual deviance: 2860.6  on 76808  degrees of freedom
  (4401 observations deleted due to missingness)
AIC: 2870.6

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model6))
   (Intercept)   agegrp(9,19]  agegrp(19,29]  agegrp(29,39] agegrp(39,100] 
   0.002681103    1.134765971    1.113190806    0.904104524    0.873890667 

Calculate the OR for the exposure Mushahar caste

LR_Model7 <- glm(case ~ mushahar, family=binomial, data=final_dataset)
summary(LR_Model7)

Call:
glm(formula = case ~ mushahar, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1805  -0.0665  -0.0665  -0.0665   3.4972  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.11316    0.07568  -80.78   <2e-16 ***
mushaharTRUE  2.00434    0.19362   10.35   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2817.5  on 81212  degrees of freedom
AIC: 2821.5

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model7))
 (Intercept) mushaharTRUE 
 0.002213537  7.421202699 

Calculate the OR for the exposure risk_wall

LR_Model8 <- glm(case ~ factor(risk_wall), family=binomial, data=final_dataset)
summary(LR_Model8)

Call:
glm(formula = case ~ factor(risk_wall), family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0850  -0.0850  -0.0737  -0.0737   3.7824  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -5.6217     0.1022 -54.982  < 2e-16 ***
factor(risk_wall)2  -0.2860     0.1446  -1.978   0.0479 *  
factor(risk_wall)3  -1.5308     0.2777  -5.512 3.54e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2842.9  on 81211  degrees of freedom
AIC: 2848.9

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model8))
       (Intercept) factor(risk_wall)2 factor(risk_wall)3 
       0.003618409        0.751245894        0.216360649 

Calculate the OR for the exposure damp floor

LR_Model9 <- glm(case ~ Damp_floor, family=binomial, data=final_dataset)
summary(LR_Model9)

Call:
glm(formula = case ~ Damp_floor, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0742  -0.0742  -0.0742  -0.0742   3.7689  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -7.1014     0.3780 -18.786  < 2e-16 ***
Damp_floorTRUE   1.2083     0.3846   3.142  0.00168 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2871.2  on 81212  degrees of freedom
AIC: 2875.2

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model9))
   (Intercept) Damp_floorTRUE 
  0.0008239173   3.3476688328 

Calculate the OR for the exposure sprayed (IRS), first for 2009 and then for 2010

LR_Model10 <- glm(case ~ IRS_09, family=binomial, data=final_dataset)
summary(LR_Model10)

Call:
glm(formula = case ~ IRS_09, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0953  -0.0675  -0.0675  -0.0675   3.4891  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.08458    0.07866 -77.356  < 2e-16 ***
IRS_09TRUE   0.69267    0.16885   4.102 4.09e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2871.3  on 81212  degrees of freedom
AIC: 2875.3

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model10))
(Intercept)  IRS_09TRUE 
0.002277712 1.999055617 
LR_Model10 <- glm(case ~ IRS_10, family=binomial, data=final_dataset)
summary(LR_Model10)

Call:
glm(formula = case ~ IRS_10, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1041  -0.0622  -0.0622  -0.0622   3.5353  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.24735    0.08813 -70.888   <2e-16 ***
IRS_10TRUE   1.03186    0.14373   7.179    7e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2840.1  on 81212  degrees of freedom
AIC: 2844.1

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model10))
(Intercept)  IRS_10TRUE 
0.001935571 2.806280365 

Calculate the OR for the exposure bednet

LR_Model11 <- glm(case ~ net3, family=binomial, data=final_dataset)
summary(LR_Model11)

Call:
glm(formula = case ~ net3, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0767  -0.0767  -0.0767  -0.0767   3.6584  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.82786    0.07383 -78.937  < 2e-16 ***
net3TRUE    -0.86298    0.22132  -3.899 9.65e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2867.0  on 81212  degrees of freedom
AIC: 2871

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model11))
(Intercept)    net3TRUE 
0.002944377 0.421901162 

Calculate the OR for the exposure ownership of animals (goats, cows, buffaloes, poultry)

LR_Model12 <- glm(case ~ own_Goa, family=binomial, data=final_dataset)
summary(LR_Model12)

Call:
glm(formula = case ~ own_Goa, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0819  -0.0819  -0.0819  -0.0667   3.4955  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.1072     0.1415 -43.149   <2e-16 ***
own_GoaTRUE   0.4108     0.1780   2.307    0.021 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1862.9  on 48198  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1866.9

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model12))
(Intercept) own_GoaTRUE 
0.002226775 1.508038778 
LR_Model13 <- glm(case ~ own_cow, family=binomial, data=final_dataset)
summary(LR_Model13)

Call:
glm(formula = case ~ own_cow, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0820  -0.0820  -0.0820  -0.0656   3.5051  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.6928     0.1056 -53.915   <2e-16 ***
own_cowTRUE  -0.4479     0.1815  -2.468   0.0136 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1862.0  on 48198  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1866

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model13))
(Intercept) own_cowTRUE 
0.003370282 0.638984603 
LR_Model14 <- glm(case ~ own_buf, family=binomial, data=final_dataset)
summary(LR_Model14)

Call:
glm(formula = case ~ own_buf, family = binomial, data = final_dataset)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-0.082  -0.073  -0.073  -0.073   3.443  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.9258     0.1006 -58.889   <2e-16 ***
own_bufTRUE   0.2329     0.1930   1.207    0.227    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1867.0  on 48198  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1871

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model14))
(Intercept) own_bufTRUE 
0.002669615 1.262265619 
LR_Model15 <- glm(case ~ own_Pou, family=binomial, data=final_dataset)
summary(LR_Model15)

Call:
glm(formula = case ~ own_Pou, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0864  -0.0742  -0.0742  -0.0742   3.4339  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.89295    0.09028 -65.271   <2e-16 ***
own_PouTRUE  0.30311    0.29216   1.037      0.3    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1867.4  on 48198  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1871.4

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model15))
(Intercept) own_PouTRUE 
0.002758837 1.354060351 

Calculate the OR for the exposures trees, rice field and water bodies

LR_Model16 <- glm(case ~ Bamboo_Tree, family=binomial, data=final_dataset)
summary(LR_Model16)

Call:
glm(formula = case ~ Bamboo_Tree, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0739  -0.0739  -0.0699  -0.0699   3.4690  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.0147     0.0910 -66.093   <2e-16 ***
Bamboo_Tree   0.1123     0.1412   0.795    0.426    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2885.3  on 81212  degrees of freedom
AIC: 2889.3

Number of Fisher Scoring iterations: 8
exp(coef(LR_Model16))
(Intercept) Bamboo_Tree 
0.002442569 1.118841553 
LR_Model17 <- glm(case ~ Banana_Tree, family=binomial, data=final_dataset)
summary(LR_Model17)

Call:
glm(formula = case ~ Banana_Tree, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0867  -0.0686  -0.0686  -0.0686   3.4793  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.5821     0.1528 -36.536  < 2e-16 ***
Banana_Tree  -0.4683     0.1716  -2.729  0.00636 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2879.2  on 81212  degrees of freedom
AIC: 2883.2

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model17))
(Intercept) Banana_Tree 
0.003764665 0.626039761 
LR_Model18 <- glm(case ~ Neem_Tree, family=binomial, data=final_dataset)
summary(LR_Model18)

Call:
glm(formula = case ~ Neem_Tree, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0750  -0.0750  -0.0750  -0.0662   3.5001  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.87275    0.08683 -67.633   <2e-16 ***
Neem_Tree   -0.25027    0.14520  -1.724   0.0848 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2882.9  on 81212  degrees of freedom
AIC: 2886.9

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model18))
(Intercept)   Neem_Tree 
0.002815113 0.778588109 
LR_Model19 <- glm(case ~ Perm_Water_Body, family=binomial, data=final_dataset)
summary(LR_Model19)

Call:
glm(formula = case ~ Perm_Water_Body, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0776  -0.0776  -0.0776  -0.0643   3.5166  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.80523    0.08922 -65.066  < 2e-16 ***
Perm_Water_Body -0.37601    0.14259  -2.637  0.00836 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2878.8  on 81212  degrees of freedom
AIC: 2882.8

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model19))
    (Intercept) Perm_Water_Body 
     0.00301176      0.68659395 
LR_Model20 <- glm(case ~ Rice_Field, family=binomial, data=final_dataset)
summary(LR_Model20)

Call:
glm(formula = case ~ Rice_Field, family = binomial, data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.0732  -0.0732  -0.0732  -0.0732   3.5194  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.1910     0.1717 -36.062   <2e-16 ***
Rice_Field    0.2713     0.1878   1.445    0.149    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2883.7  on 81212  degrees of freedom
AIC: 2887.7

Number of Fisher Scoring iterations: 9
exp(coef(LR_Model20))
(Intercept)  Rice_Field 
0.002047823 1.311719312 

Now I fit a logistic regression model with all factors that were significant at p < 0.10. I start by fitting univariate logistic regression models for all factors, each followed by a likelihood-ratio (LR) test.
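
The fit-then-test pattern described here can also be written as a loop. The following is a minimal sketch on simulated data, not the analysis itself: the data frame `toy` and the exposure names `x1` and `x2` are hypothetical, and the 0.10 cut-off is the one stated above.

```r
set.seed(1)
n <- 500

# Simulated data: a binary outcome and two exposures (one binary, one 3-level factor).
toy <- data.frame(
  case = rbinom(n, 1, 0.2),
  x1   = rbinom(n, 1, 0.5),
  x2   = factor(sample(1:3, n, replace = TRUE))
)

exposures <- c("x1", "x2")

# For each exposure: fit a univariate logistic model and extract the LR test p-value.
lr_pvalues <- sapply(exposures, function(v) {
  fit <- glm(reformulate(v, response = "case"), family = binomial(logit), data = toy)
  anova(fit, test = "Chisq")[2, "Pr(>Chi)"]
})

lr_pvalues

# Exposures that would be carried forward to the multivariate model at p < 0.10.
names(lr_pvalues)[lr_pvalues < 0.10]
```

reformulate() builds the formula from the variable name, so the list of exposures can be extended without writing a separate glm() call for each one.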

GLM.a <- glm(case ~ agegrp, family=binomial(logit), data=final_dataset)
GLM.b <- glm(case ~ Bamboo_Tree, family=binomial(logit), data=final_dataset)
GLM.c <- glm(case ~ Banana_Tree, family=binomial(logit), data=final_dataset)
GLM.d <- glm(case ~ Neem_Tree, family=binomial(logit), data=final_dataset)
GLM.e <- glm(case ~ Perm_Water_Body, family=binomial(logit), data=final_dataset)
GLM.f <- glm(case ~ Rice_Field, family=binomial(logit), data=final_dataset)
GLM.g <- glm(case ~ IRS_09, family=binomial(logit), data=final_dataset)
GLM.h <- glm(case ~ IRS_10, family=binomial(logit), data=final_dataset)
GLM.i <- glm(case ~ mushahar, family=binomial(logit), data=final_dataset)
GLM.j <- glm(case ~ net3, family=binomial(logit), data=final_dataset)
GLM.k <- glm(case ~ risk_wall, family=binomial(logit), data=final_dataset)
GLM.l <- glm(case ~ own_buf, family=binomial(logit), data=final_dataset)
GLM.m <- glm(case ~ own_cow, family=binomial(logit), data=final_dataset)
GLM.n <- glm(case ~ own_Goa, family=binomial(logit), data=final_dataset)
GLM.o <- glm(case ~ own_Pou, family=binomial(logit), data=final_dataset)
GLM.p <- glm(case ~ agegrp, family=binomial(logit), data=final_dataset)
GLM.q <- glm(case ~ female, family=binomial(logit), data=final_dataset)
GLM.r <- glm(case ~ Damp_floor, family=binomial(logit), data=final_dataset)
GLM.s <- glm(case ~ asset_index, family=binomial(logit), data=final_dataset)
anova(GLM.a, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                   76812     2862.8         
agegrp  4   2.2407     76808     2860.6   0.6916
anova(GLM.b, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

            Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                        81213     2885.9         
Bamboo_Tree  1  0.62798     81212     2885.3   0.4281
anova(GLM.c, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

            Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
NULL                        81213     2885.9            
Banana_Tree  1   6.7795     81212     2879.2 0.009221 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.d, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

          Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
NULL                      81213     2885.9           
Neem_Tree  1   3.0351     81212     2882.9  0.08148 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.e, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

                Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
NULL                            81213     2885.9            
Perm_Water_Body  1   7.1054     81212     2878.8 0.007685 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.f, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

           Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                       81213     2885.9         
Rice_Field  1   2.2157     81212     2883.7   0.1366
anova(GLM.g, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   81213     2885.9              
IRS_09  1   14.681     81212     2871.2 0.0001273 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.h, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   81213     2885.9              
IRS_10  1   45.825     81212     2840.1 1.293e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.i, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                     81213     2885.9              
mushahar  1   68.449     81212     2817.5 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.j, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                 81213     2885.9              
net3  1   18.931     81212     2867.0 1.355e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.k, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

          Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                      81213     2885.9              
risk_wall  2   43.079     81211     2842.8 4.421e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.l, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                    48199     1868.4         
own_buf  1   1.4042     48198     1867.0    0.236
anova(GLM.m, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
NULL                    48199     1868.4           
own_cow  1    6.344     48198     1862.0  0.01178 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.n, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
NULL                    48199     1868.4           
own_Goa  1   5.4818     48198     1862.9  0.01922 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.o, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                    48199     1868.4         
own_Pou  1  0.99283     48198     1867.4   0.3191
anova(GLM.p, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                   76812     2862.8         
agegrp  4   2.2407     76808     2860.6   0.6916
anova(GLM.q, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                   81213     2885.9         
female  1  0.29132     81212     2885.6   0.5894
anova(GLM.r, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

           Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                       81213     2885.9              
Damp_floor  1   14.684     81212     2871.2 0.0001271 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(GLM.s, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: case

Terms added sequentially (first to last)

            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        81213     2885.9              
asset_index  4   55.333     81209     2830.6 2.766e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

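Significant at the 0.10 level are: IRS_09, IRS_10, asset index, banana tree, neem tree, permanent water body, mushahar, net3, own_Goa, own_buf, risk_wall and Damp_floor. Note that in the tables above own_buf has p = 0.236 and own_cow has p = 0.012, so by this threshold own_cow qualifies and own_buf does not.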

In the final model I will consider all of these except the IRS variables, because for those I suspect reverse causality.
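The one-variable-at-a-time screening above can also be written as a loop. The following is a minimal sketch on simulated data: the data frame `sim`, its effect sizes and the short `candidates` vector are assumptions made for illustration and are not taken from `final_dataset`.

```r
# Simulated stand-in for final_dataset; column names and effects are made up.
set.seed(1)
n <- 5000
sim <- data.frame(
  mushahar   = runif(n) < 0.05,
  Rice_Field = runif(n) < 0.50,
  risk_wall  = factor(sample(1:3, n, replace = TRUE))
)
sim$case <- rbinom(n, 1, plogis(-3 + 1.5 * sim$mushahar))

candidates <- c("mushahar", "Rice_Field", "risk_wall")

# One single-predictor logistic model per candidate; keep the LR test p-value.
screen <- sapply(candidates, function(v) {
  fit <- glm(reformulate(v, response = "case"),
             family = binomial(logit), data = sim)
  anova(fit, test = "Chisq")[v, "Pr(>Chi)"]
})

# Candidates that pass the 0.10 screening threshold.
keep <- names(screen)[screen < 0.10]
print(round(screen, 4))
print(keep)
```

Collecting the p-values in one named vector makes it harder to misread a threshold when the list of candidates is long.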

GLM.1 <- glm(case ~ asset_index + Banana_Tree + Neem_Tree + mushahar + Perm_Water_Body + net3 + own_Goa + own_buf + risk_wall + Damp_floor, 
             family=binomial(logit), data=final_dataset)
summary(GLM.1)

Call:
glm(formula = case ~ asset_index + Banana_Tree + Neem_Tree + 
    mushahar + Perm_Water_Body + net3 + own_Goa + own_buf + risk_wall + 
    Damp_floor, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2934  -0.0833  -0.0653  -0.0497   4.0039  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -6.1440     0.7317  -8.397  < 2e-16 ***
asset_index2      0.9281     0.2712   3.422 0.000623 ***
asset_index3      0.0783     0.3058   0.256 0.797901    
asset_index4      0.1684     0.3197   0.527 0.598314    
asset_index5     -0.1809     0.4125  -0.438 0.661091    
Banana_Tree      -0.1740     0.2291  -0.760 0.447361    
Neem_Tree        -0.0299     0.1836  -0.163 0.870645    
mushaharTRUE      1.6537     0.3112   5.314 1.07e-07 ***
Perm_Water_Body  -0.2044     0.1777  -1.150 0.250063    
net3TRUE         -0.1530     0.2795  -0.547 0.584210    
own_GoaTRUE       0.2875     0.1890   1.521 0.128223    
own_bufTRUE       0.2834     0.2038   1.391 0.164337    
risk_wall2       -0.5215     0.1929  -2.704 0.006852 ** 
risk_wall3       -1.1292     0.4380  -2.578 0.009933 ** 
Damp_floorTRUE    0.3249     0.6498   0.500 0.617109    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1780.2  on 48185  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1810.2

Number of Fisher Scoring iterations: 10
exp(coef(GLM.1))  # Exponentiated coefficients ("odds ratios")
    (Intercept)    asset_index2    asset_index3    asset_index4    asset_index5 
    0.002146395     2.529682254     1.081449092     1.183449238     0.834547488 
    Banana_Tree       Neem_Tree    mushaharTRUE Perm_Water_Body        net3TRUE 
    0.840263155     0.970540858     5.226359245     0.815175073     0.858155462 
    own_GoaTRUE     own_bufTRUE      risk_wall2      risk_wall3  Damp_floorTRUE 
    1.333083419     1.327582613     0.593597060     0.323281498     1.383833170 

The weakest term is neem tree, with a p-value of 0.871, so I drop it.

GLM.2 <- glm(case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + net3 + own_Goa + own_buf + risk_wall + Damp_floor, 
             family=binomial(logit), data=final_dataset)
summary(GLM.2)

Call:
glm(formula = case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall + Damp_floor, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2926  -0.0842  -0.0649  -0.0499   4.0004  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -6.15294    0.72969  -8.432  < 2e-16 ***
asset_index2     0.92662    0.27114   3.418 0.000632 ***
asset_index3     0.07664    0.30568   0.251 0.802023    
asset_index4     0.16621    0.31947   0.520 0.602870    
asset_index5    -0.18429    0.41212  -0.447 0.654746    
Banana_Tree     -0.17534    0.22893  -0.766 0.443724    
mushaharTRUE     1.65995    0.30894   5.373 7.74e-08 ***
Perm_Water_Body -0.20595    0.17741  -1.161 0.245694    
net3TRUE        -0.15366    0.27952  -0.550 0.582517    
own_GoaTRUE      0.28719    0.18901   1.519 0.128650    
own_bufTRUE      0.28588    0.20319   1.407 0.159434    
risk_wall2      -0.52130    0.19287  -2.703 0.006874 ** 
risk_wall3      -1.12898    0.43805  -2.577 0.009958 ** 
Damp_floorTRUE   0.32526    0.64983   0.501 0.616700    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1780.2  on 48186  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1808.2

Number of Fisher Scoring iterations: 10
exp(coef(GLM.2))  # Exponentiated coefficients ("odds ratios")
    (Intercept)    asset_index2    asset_index3    asset_index4    asset_index5 
    0.002127209     2.525966224     1.079656728     1.180824956     0.831692253 
    Banana_Tree    mushaharTRUE Perm_Water_Body        net3TRUE     own_GoaTRUE 
    0.839172733     5.259049206     0.813875981     0.857566816     1.332676365 
    own_bufTRUE      risk_wall2      risk_wall3  Damp_floorTRUE 
    1.330938029     0.593750888     0.323362129     1.384390120 
anova(GLM.1, GLM.2, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + Banana_Tree + Neem_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall + Damp_floor
Model 2: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall + Damp_floor
  Resid. Df Resid. Dev Df  Deviance Pr(>Chi)
1     48185     1780.2                      
2     48186     1780.2 -1 -0.026584   0.8705
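Each elimination step above reads the weakest Wald p-value from `summary()` and then confirms it with `anova()`. As a possible shortcut, `drop1()` reports one likelihood ratio test per term in a single call and tests a multi-level factor such as asset index jointly. The sketch below uses simulated data; `sim` and its columns are illustrative assumptions.

```r
set.seed(2)
n <- 5000
sim <- data.frame(
  mushahar    = runif(n) < 0.05,
  Neem_Tree   = runif(n) < 0.30,
  asset_index = factor(sample(1:5, n, replace = TRUE))
)
sim$case <- rbinom(n, 1, plogis(-3 + 1.5 * sim$mushahar))

full <- glm(case ~ asset_index + Neem_Tree + mushahar,
            family = binomial(logit), data = sim)

# One LR test per term; asset_index gets a single 4-df test.
lr <- drop1(full, test = "Chisq")
print(lr)

# The term with the largest LR p-value is the candidate to drop next.
p <- lr[-1, "Pr(>Chi)"]
names(p) <- rownames(lr)[-1]
weakest <- names(which.max(p))
reduced <- update(full, as.formula(paste(". ~ . -", weakest)))
```

Because `drop1()` refits on the same rows each time, its p-values match what `anova(full, reduced, test = "Chisq")` would give for the same pair of models.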

The weakest term is now damp floor, with a p-value of 0.617, so I drop it.

GLM.3 <- glm(case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + net3 + own_Goa + own_buf + risk_wall, 
             family=binomial(logit), data=final_dataset)
summary(GLM.3)

Call:
glm(formula = case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2927  -0.0843  -0.0649  -0.0488   3.9467  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.83307    0.34973 -16.679  < 2e-16 ***
asset_index2     0.92713    0.27117   3.419 0.000629 ***
asset_index3     0.07777    0.30570   0.254 0.799178    
asset_index4     0.16721    0.31946   0.523 0.600686    
asset_index5    -0.20574    0.41189  -0.499 0.617434    
Banana_Tree     -0.17128    0.22882  -0.749 0.454138    
mushaharTRUE     1.65872    0.30898   5.368 7.95e-08 ***
Perm_Water_Body -0.20662    0.17736  -1.165 0.244041    
net3TRUE        -0.15580    0.27970  -0.557 0.577520    
own_GoaTRUE      0.28757    0.18904   1.521 0.128206    
own_bufTRUE      0.28963    0.20308   1.426 0.153811    
risk_wall2      -0.52212    0.19284  -2.708 0.006779 ** 
risk_wall3      -1.21517    0.41253  -2.946 0.003223 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1780.5  on 48187  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1806.5

Number of Fisher Scoring iterations: 9
exp(coef(GLM.3))  # Exponentiated coefficients ("odds ratios")
    (Intercept)    asset_index2    asset_index3    asset_index4    asset_index5 
    0.002929059     2.527233069     1.080878012     1.182004240     0.814048351 
    Banana_Tree    mushaharTRUE Perm_Water_Body        net3TRUE     own_GoaTRUE 
    0.842583760     5.252598902     0.813331045     0.855733043     1.333184251 
    own_bufTRUE      risk_wall2      risk_wall3 
    1.335938017     0.593263747     0.296659129 
anova(GLM.2, GLM.3, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall + Damp_floor
Model 2: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     48186     1780.2                     
2     48187     1780.5 -1 -0.26782   0.6048

The weakest term is now net3, with a p-value of 0.578, so I drop it.

GLM.4 <- glm(case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + own_Goa + own_buf + risk_wall, 
             family=binomial(logit), data=final_dataset)
summary(GLM.4)

Call:
glm(formula = case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    own_Goa + own_buf + risk_wall, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2929  -0.0837  -0.0652  -0.0495   3.9258  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.84985    0.34892 -16.766  < 2e-16 ***
asset_index2     0.92647    0.27122   3.416 0.000636 ***
asset_index3     0.07246    0.30560   0.237 0.812561    
asset_index4     0.14698    0.31768   0.463 0.643597    
asset_index5    -0.24545    0.40683  -0.603 0.546284    
Banana_Tree     -0.16924    0.22883  -0.740 0.459539    
mushaharTRUE     1.67108    0.30840   5.419 6.01e-08 ***
Perm_Water_Body -0.20758    0.17737  -1.170 0.241881    
own_GoaTRUE      0.29393    0.18885   1.556 0.119602    
own_bufTRUE      0.29379    0.20299   1.447 0.147815    
risk_wall2      -0.52443    0.19285  -2.719 0.006541 ** 
risk_wall3      -1.23333    0.41184  -2.995 0.002748 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1780.8  on 48188  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1804.8

Number of Fisher Scoring iterations: 9
exp(coef(GLM.4))  # Exponentiated coefficients ("odds ratios")
    (Intercept)    asset_index2    asset_index3    asset_index4    asset_index5 
    0.002880318     2.525581693     1.075155173     1.158334223     0.782348527 
    Banana_Tree    mushaharTRUE Perm_Water_Body     own_GoaTRUE     own_bufTRUE 
    0.844302851     5.317891377     0.812551966     1.341690715     1.341496852 
     risk_wall2      risk_wall3 
    0.591894775     0.291320778 
anova(GLM.3, GLM.4, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    net3 + own_Goa + own_buf + risk_wall
Model 2: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    own_Goa + own_buf + risk_wall
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     48187     1780.5                     
2     48188     1780.8 -1 -0.32058   0.5713

The weakest term is now banana tree, with a p-value of 0.460, so I drop it.

GLM.5 <- glm(case ~ asset_index + mushahar + Perm_Water_Body + own_Goa + own_buf + risk_wall, 
             family=binomial(logit), data=final_dataset)
summary(GLM.5)

Call:
glm(formula = case ~ asset_index + mushahar + Perm_Water_Body + 
    own_Goa + own_buf + risk_wall, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3014  -0.0847  -0.0653  -0.0498   3.9226  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -5.98763    0.29714 -20.151  < 2e-16 ***
asset_index2     0.92714    0.27152   3.415 0.000639 ***
asset_index3     0.06817    0.30574   0.223 0.823569    
asset_index4     0.14386    0.31798   0.452 0.650976    
asset_index5    -0.25014    0.40673  -0.615 0.538547    
mushaharTRUE     1.69505    0.30715   5.519 3.42e-08 ***
Perm_Water_Body -0.21717    0.17690  -1.228 0.219587    
own_GoaTRUE      0.29642    0.18882   1.570 0.116457    
own_bufTRUE      0.29155    0.20293   1.437 0.150796    
risk_wall2      -0.52801    0.19281  -2.738 0.006173 ** 
risk_wall3      -1.23801    0.41153  -3.008 0.002627 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1781.3  on 48189  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1803.3

Number of Fisher Scoring iterations: 9
exp(coef(GLM.5))  # Exponentiated coefficients ("odds ratios")
    (Intercept)    asset_index2    asset_index3    asset_index4    asset_index5 
     0.00250961      2.52725830      1.07054456      1.15471734      0.77868991 
   mushaharTRUE Perm_Water_Body     own_GoaTRUE     own_bufTRUE      risk_wall2 
     5.44690955      0.80479068      1.34503316      1.33850440      0.58977815 
     risk_wall3 
     0.28996011 
anova(GLM.4, GLM.5, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    own_Goa + own_buf + risk_wall
Model 2: case ~ asset_index + mushahar + Perm_Water_Body + own_Goa + own_buf + 
    risk_wall
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     48188     1780.8                     
2     48189     1781.3 -1 -0.52843   0.4673

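The weakest term is now permanent water body, with a p-value of 0.220, so I drop it. The refitted model below reuses the name GLM.5, so the comparison against GLM.4 that follows tests banana tree and permanent water body jointly (2 degrees of freedom).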

GLM.5 <- glm(case ~ asset_index + mushahar + own_Goa + own_buf + risk_wall, 
             family=binomial(logit), data=final_dataset)
summary(GLM.5)

Call:
glm(formula = case ~ asset_index + mushahar + own_Goa + own_buf + 
    risk_wall, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2928  -0.0813  -0.0647  -0.0527   3.8970  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.08251    0.28798 -21.121  < 2e-16 ***
asset_index2  0.92263    0.27136   3.400 0.000674 ***
asset_index3  0.06921    0.30564   0.226 0.820865    
asset_index4  0.13971    0.31813   0.439 0.660533    
asset_index5 -0.25522    0.40697  -0.627 0.530589    
mushaharTRUE  1.72838    0.30586   5.651  1.6e-08 ***
own_GoaTRUE   0.30346    0.18877   1.608 0.107938    
own_bufTRUE   0.28955    0.20273   1.428 0.153227    
risk_wall2   -0.52996    0.19290  -2.747 0.006008 ** 
risk_wall3   -1.25498    0.41132  -3.051 0.002280 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1782.9  on 48190  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1802.9

Number of Fisher Scoring iterations: 9
exp(coef(GLM.5))  # Exponentiated coefficients ("odds ratios")
 (Intercept) asset_index2 asset_index3 asset_index4 asset_index5 mushaharTRUE 
 0.002282434  2.515906508  1.071659119  1.149944761  0.774748456  5.631528602 
 own_GoaTRUE  own_bufTRUE   risk_wall2   risk_wall3 
 1.354533778  1.335826304  0.588629197  0.285080672 
anova(GLM.4, GLM.5, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + Banana_Tree + mushahar + Perm_Water_Body + 
    own_Goa + own_buf + risk_wall
Model 2: case ~ asset_index + mushahar + own_Goa + own_buf + risk_wall
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     48188     1780.8                     
2     48190     1782.8 -2  -2.0541   0.3581

The weakest term is now own_buf, with a p-value of 0.153, so I drop it.

GLM.6 <- glm(case ~ asset_index + mushahar + own_Goa + risk_wall, 
             family=binomial(logit), data=final_dataset)
summary(GLM.6)

Call:
glm(formula = case ~ asset_index + mushahar + own_Goa + risk_wall, 
    family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3019  -0.0836  -0.0662  -0.0481   3.8724  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.98541    0.27950 -21.415  < 2e-16 ***
asset_index2  0.95132    0.27085   3.512 0.000444 ***
asset_index3  0.09875    0.30529   0.323 0.746354    
asset_index4  0.17073    0.31769   0.537 0.590986    
asset_index5 -0.23452    0.40725  -0.576 0.564699    
mushaharTRUE  1.73597    0.30660   5.662  1.5e-08 ***
own_GoaTRUE   0.23231    0.18129   1.281 0.200046    
risk_wall2   -0.53998    0.19254  -2.805 0.005039 ** 
risk_wall3   -1.27728    0.41135  -3.105 0.001902 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1868.4  on 48199  degrees of freedom
Residual deviance: 1784.8  on 48191  degrees of freedom
  (33014 observations deleted due to missingness)
AIC: 1802.8

Number of Fisher Scoring iterations: 9
exp(coef(GLM.6))  # Exponentiated coefficients ("odds ratios")
 (Intercept) asset_index2 asset_index3 asset_index4 asset_index5 mushaharTRUE 
 0.002515194  2.589122945  1.103787926  1.186168590  0.790946735  5.674401512 
 own_GoaTRUE   risk_wall2   risk_wall3 
 1.261511627  0.582761766  0.278793638 
anova(GLM.5, GLM.6, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + mushahar + own_Goa + own_buf + risk_wall
Model 2: case ~ asset_index + mushahar + own_Goa + risk_wall
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     48190     1782.8                     
2     48191     1784.8 -1  -1.9646    0.161

The weakest term is now own_Goa, with a p-value of 0.200, so I drop it.

GLM.7 <- glm(case ~ asset_index + mushahar + risk_wall, 
             family=binomial(logit), data=final_dataset)
summary(GLM.7)

Call:
glm(formula = case ~ asset_index + mushahar + risk_wall, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2709  -0.0772  -0.0656  -0.0484   3.8634  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.81466    0.16086 -36.148  < 2e-16 ***
asset_index2  0.55815    0.19458   2.869 0.004124 ** 
asset_index3 -0.08206    0.22363  -0.367 0.713656    
asset_index4  0.08176    0.22907   0.357 0.721141    
asset_index5 -0.52755    0.31406  -1.680 0.092998 .  
mushaharTRUE  1.96990    0.22180   8.881  < 2e-16 ***
risk_wall2   -0.40614    0.15684  -2.590 0.009609 ** 
risk_wall3   -1.11997    0.30776  -3.639 0.000274 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2758.5  on 81206  degrees of freedom
AIC: 2774.5

Number of Fisher Scoring iterations: 9
exp(coef(GLM.7))  # Exponentiated coefficients ("odds ratios")
 (Intercept) asset_index2 asset_index3 asset_index4 asset_index5 mushaharTRUE 
 0.002983507  1.747430447  0.921214685  1.085197543  0.590046584  7.169994094 
  risk_wall2   risk_wall3 
 0.666215469  0.326289121 
#anova(GLM.6, GLM.7, test="Chisq")

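The LR test is non-significant (p = 0.06), so I can indeed drop own_Goa. Note that GLM.6 and GLM.7 are fitted to different numbers of observations (48,200 versus 81,214, since own_Goa has missing values), so anova() cannot compare them as fitted.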
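A like-for-like LR test needs both models fitted to the same rows. One way to get that, sketched here on simulated data, is to refit the smaller model on the model frame of the larger one. The data frame `sim` and the 40% missingness are assumptions for illustration only.

```r
set.seed(3)
n <- 4000
sim <- data.frame(
  mushahar = runif(n) < 0.05,
  own_Goa  = runif(n) < 0.40
)
sim$case <- rbinom(n, 1, plogis(-3 + 1.5 * sim$mushahar))
# Make own_Goa missing for 40% of rows, mimicking a partly recorded covariate.
sim$own_Goa[sample(n, 0.4 * n)] <- NA

big <- glm(case ~ mushahar + own_Goa, family = binomial(logit), data = sim)

# Refit the smaller model on exactly the rows the larger model used.
used  <- model.frame(big)
small <- glm(case ~ mushahar, family = binomial(logit), data = used)

stopifnot(nobs(big) == nobs(small))
print(anova(small, big, test = "Chisq"))
```

The final model can still be refitted on all available rows afterwards, as is done for GLM.7 above; the restricted fit is only needed for the test itself.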

Now all factors are significant for at least one level. Asset index is the weakest, so I will try the model without it.

GLM.8 <- glm(case ~ mushahar + risk_wall, family=binomial(logit), data=final_dataset)
summary(GLM.8)

Call:
glm(formula = case ~ mushahar + risk_wall, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2219  -0.0825  -0.0648  -0.0648   3.7858  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -5.6816     0.1032 -55.069  < 2e-16 ***
mushaharTRUE   1.9894     0.2019   9.853  < 2e-16 ***
risk_wall2    -0.4843     0.1500  -3.229  0.00124 ** 
risk_wall3    -1.4839     0.2779  -5.339 9.33e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2777.4  on 81210  degrees of freedom
AIC: 2785.4

Number of Fisher Scoring iterations: 9
exp(coef(GLM.8))  # Exponentiated coefficients ("odds ratios")
 (Intercept) mushaharTRUE   risk_wall2   risk_wall3 
 0.003408013  7.310967636  0.616106559  0.226747130 

Clearly the model with asset index is significantly better than the model without it (p = 0.0008).

anova(GLM.7, GLM.8, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + mushahar + risk_wall
Model 2: case ~ mushahar + risk_wall
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1     81206     2758.5                          
2     81210     2777.4 -4  -18.889 0.0008265 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
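The LR p-value in this table can be checked by hand from the two residual deviances. The sketch below uses only the rounded figures printed in the GLM.7 and GLM.8 summaries, so the result agrees with the table up to rounding; the variable names are introduced here for illustration.

```r
# Residual deviances and degrees of freedom as printed for GLM.7 and GLM.8.
dev_with    <- 2758.5; df_with    <- 81206
dev_without <- 2777.4; df_without <- 81210

lr_stat <- dev_without - dev_with   # difference of the rounded deviances
lr_df   <- df_without - df_with     # 4 parameters for the 5-level asset index
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```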

Next I fit an interaction between risk wall and asset index

#GLM.9 <- glm(case ~ mushahar + risk_wall + asset_index  +asset_index*risk_wall, family=binomial(logit), data=final_dataset)
#summary(GLM.9)
#exp(coef(GLM.9))  # Exponentiated coefficients ("odds ratios")

Then I run the likelihood ratio test

#anova(GLM.7, GLM.9, test="Chisq")

The p-value of the LR test is 0.72, so the model with the interaction is not significantly better.

Next I try interaction between Mushahar and asset index

GLM.10 <- glm(case ~ mushahar + risk_wall + asset_index + asset_index * mushahar, family=binomial(logit), data=final_dataset)
summary(GLM.10)

Call:
glm(formula = case ~ mushahar + risk_wall + asset_index + asset_index * 
    mushahar, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3379  -0.0776  -0.0655  -0.0474   3.8754  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)                -5.80280    0.17293 -33.556  < 2e-16 ***
mushaharTRUE                1.90493    0.29048   6.558 5.46e-11 ***
risk_wall2                 -0.38118    0.15647  -2.436 0.014847 *  
risk_wall3                 -1.09787    0.30715  -3.574 0.000351 ***
asset_index2                0.49491    0.21872   2.263 0.023651 *  
asset_index3               -0.02642    0.23691  -0.112 0.911219    
asset_index4                0.04108    0.24367   0.169 0.866110    
asset_index5               -0.60824    0.32754  -1.857 0.063309 .  
mushaharTRUE:asset_index2   0.37575    0.46189   0.814 0.415930    
mushaharTRUE:asset_index3 -13.26060  274.31195  -0.048 0.961444    
mushaharTRUE:asset_index4   0.32648    0.78551   0.416 0.677683    
mushaharTRUE:asset_index5   1.67151    1.09699   1.524 0.127578    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2750.4  on 81202  degrees of freedom
AIC: 2774.4

Number of Fisher Scoring iterations: 16
exp(coef(GLM.10))  # Exponentiated coefficients ("odds ratios")
              (Intercept)              mushaharTRUE                risk_wall2 
             3.019089e-03              6.718935e+00              6.830553e-01 
               risk_wall3              asset_index2              asset_index3 
             3.335810e-01              1.640354e+00              9.739296e-01 
             asset_index4              asset_index5 mushaharTRUE:asset_index2 
             1.041938e+00              5.443084e-01              1.456078e+00 
mushaharTRUE:asset_index3 mushaharTRUE:asset_index4 mushaharTRUE:asset_index5 
             1.741781e-06              1.386079e+00              5.320179e+00 

The LR test has a p-value of 0.09, so the more complex model is not significantly better.

anova(GLM.7, GLM.10, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ asset_index + mushahar + risk_wall
Model 2: case ~ mushahar + risk_wall + asset_index + asset_index * mushahar
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1     81206     2758.5                       
2     81202     2750.4  4   8.0778  0.08877 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
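The very large standard error on mushaharTRUE:asset_index3 (estimate about -13, standard error about 274, 16 Fisher scoring iterations) is the pattern usually seen when one cell of the interaction contains no cases. Whether that is the cause here would need checking against the data. The sketch below shows such a check on simulated data in which one cell is deliberately emptied; `sim` and all its values are assumptions.

```r
set.seed(5)
n <- 6000
sim <- data.frame(
  mushahar    = runif(n) < 0.05,
  asset_index = factor(sample(1:5, n, replace = TRUE))
)
sim$case <- rbinom(n, 1, plogis(-3 + 1.5 * sim$mushahar))
# Force one cell to have no cases at all (an assumption made for illustration).
sim$case[sim$mushahar & sim$asset_index == "3"] <- 0

# Cases per mushahar x asset_index cell; a zero marks a sparse cell.
cases_by_cell <- xtabs(case ~ mushahar + asset_index, data = sim)
print(cases_by_cell)

fit <- glm(case ~ mushahar * asset_index, family = binomial(logit), data = sim)
se  <- sqrt(diag(vcov(fit)))
print(round(se["mushaharTRUE:asset_index3"], 1))
```

When a cell is empty, the Wald p-value for that coefficient is not informative, and the LR test for the whole interaction is the safer summary.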

Finally I test the interaction between Mushahar and risk wall

GLM.11 <- glm(case ~ mushahar + risk_wall + asset_index + risk_wall* mushahar, family=binomial(logit), data=final_dataset)
summary(GLM.11)

Call:
glm(formula = case ~ mushahar + risk_wall + asset_index + risk_wall * 
    mushahar, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2956  -0.0766  -0.0660  -0.0484   3.8607  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -5.82884    0.16381 -35.583  < 2e-16 ***
mushaharTRUE              2.15878    0.40886   5.280 1.29e-07 ***
risk_wall2               -0.38007    0.16606  -2.289 0.022098 *  
risk_wall3               -1.08461    0.30997  -3.499 0.000467 ***
asset_index2              0.56158    0.19505   2.879 0.003988 ** 
asset_index3             -0.07706    0.22438  -0.343 0.731262    
asset_index4              0.08036    0.22972   0.350 0.726472    
asset_index5             -0.53846    0.31488  -1.710 0.087256 .  
mushaharTRUE:risk_wall2  -0.23593    0.46385  -0.509 0.611009    
mushaharTRUE:risk_wall3 -10.68984  229.30575  -0.047 0.962817    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2757.8  on 81204  degrees of freedom
AIC: 2777.8

Number of Fisher Scoring iterations: 14
exp(coef(GLM.11))  # Exponentiated coefficients ("odds ratios")
            (Intercept)            mushaharTRUE              risk_wall2 
           2.941478e-03            8.660562e+00            6.838146e-01 
             risk_wall3            asset_index2            asset_index3 
           3.380347e-01            1.753440e+00            9.258332e-01 
           asset_index4            asset_index5 mushaharTRUE:risk_wall2 
           1.083679e+00            5.836490e-01            7.898355e-01 
mushaharTRUE:risk_wall3 
           2.277521e-05 

The LR test has a p-value of 0.69, so there is no indication of an interaction.

anova(GLM.11, GLM.7, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ mushahar + risk_wall + asset_index + risk_wall * mushahar
Model 2: case ~ asset_index + mushahar + risk_wall
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     81204     2757.8                     
2     81206     2758.5 -2 -0.75474   0.6857

So my final model remains model 7 with asset index, mushahar and risk wall

GLM.7 <- glm(case ~ mushahar + risk_wall + asset_index, family=binomial(logit), data=final_dataset)
summary(GLM.7)

Call:
glm(formula = case ~ mushahar + risk_wall + asset_index, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2709  -0.0772  -0.0656  -0.0484   3.8634  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.81466    0.16086 -36.148  < 2e-16 ***
mushaharTRUE  1.96990    0.22180   8.881  < 2e-16 ***
risk_wall2   -0.40614    0.15684  -2.590 0.009609 ** 
risk_wall3   -1.11997    0.30776  -3.639 0.000274 ***
asset_index2  0.55815    0.19458   2.869 0.004124 ** 
asset_index3 -0.08206    0.22363  -0.367 0.713656    
asset_index4  0.08176    0.22907   0.357 0.721141    
asset_index5 -0.52755    0.31406  -1.680 0.092998 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2758.5  on 81206  degrees of freedom
AIC: 2774.5

Number of Fisher Scoring iterations: 9
exp(coef(GLM.7))  # Exponentiated coefficients ("odds ratios")
 (Intercept) mushaharTRUE   risk_wall2   risk_wall3 asset_index2 asset_index3 
 0.002983507  7.169994094  0.666215469  0.326289121  1.747430447  0.921214685 
asset_index4 asset_index5 
 1.085197543  0.590046584 
exp(confint(GLM.7))
Waiting for profiling to be done...
                   2.5 %       97.5 %
(Intercept)  0.002150039  0.004041636
mushaharTRUE 4.582946449 10.957987112
risk_wall2   0.489354472  0.905649249
risk_wall3   0.172344535  0.579799021
asset_index2 1.193911726  2.564485167
asset_index3 0.590438223  1.422810718
asset_index4 0.689253185  1.696017962
asset_index5 0.311077759  1.071790719
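For reporting, the odds ratios and their confidence limits can be gathered into one table. This is a minimal sketch on simulated data; the data frame `sim`, the model `fit` and the column names of `or_table` are illustrative choices.

```r
set.seed(6)
n <- 5000
sim <- data.frame(
  mushahar  = runif(n) < 0.05,
  risk_wall = factor(sample(1:3, n, replace = TRUE))
)
sim$case <- rbinom(n, 1,
                   plogis(-3 + 1.5 * sim$mushahar - 0.4 * (sim$risk_wall == "2")))

fit <- glm(case ~ mushahar + risk_wall, family = binomial(logit), data = sim)

# Odds ratios with profile-likelihood 95% intervals, intercept row dropped.
ci <- suppressMessages(confint(fit))
or_table <- data.frame(
  OR    = exp(coef(fit)),
  lower = exp(ci[, 1]),
  upper = exp(ci[, 2]),
  p     = coef(summary(fit))[, "Pr(>|z|)"]
)[-1, ]
print(round(or_table, 3))
```

The intercept is dropped because its exponentiated value is a baseline odds, not an odds ratio.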

Final model: GLM.7

You could consider recoding asset index to a binary variable poverty, with levels 1 to 4 = TRUE and level 5 = FALSE.

The interaction terms remain non-significant.

If you really want to check for interaction, it is better to have only two categories.

Creating the poverty variable

# poverty is TRUE for asset index levels 1 to 4 and FALSE for level 5
final_dataset$poverty <- final_dataset$asset_index %in% 1:4
GLM.14 <- glm(case ~ mushahar + risk_wall + poverty, family=binomial(logit), data=final_dataset)
summary(GLM.14)

Call:
glm(formula = case ~ mushahar + risk_wall + poverty, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2155  -0.0835  -0.0682  -0.0488   3.8695  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -6.3256     0.2879 -21.973  < 2e-16 ***
mushaharTRUE   1.9064     0.2042   9.336  < 2e-16 ***
risk_wall2    -0.4064     0.1524  -2.666 0.007675 ** 
risk_wall3    -1.1603     0.3002  -3.865 0.000111 ***
povertyTRUE    0.6683     0.2760   2.422 0.015456 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2770.7  on 81209  degrees of freedom
AIC: 2780.7

Number of Fisher Scoring iterations: 9
exp(coef(GLM.14))  # Exponentiated coefficients ("odds ratios")
 (Intercept) mushaharTRUE   risk_wall2   risk_wall3  povertyTRUE 
 0.001789926  6.729074823  0.666021666  0.313406479  1.950935905 
exp(confint(GLM.14))
Waiting for profiling to be done...
                    2.5 %      97.5 %
(Intercept)  0.0009857236 0.003061897
mushaharTRUE 4.4406396469 9.913043491
risk_wall2   0.4934245075 0.897685663
risk_wall3   0.1676370338 0.547489293
povertyTRUE  1.1674794260 3.464443760
GLM.15 <- glm(case ~ mushahar + risk_wall + poverty+ poverty*mushahar, family=binomial(logit), data=final_dataset)
summary(GLM.15)

Call:
glm(formula = case ~ mushahar + risk_wall + poverty + poverty * 
    mushahar, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3402  -0.0835  -0.0686  -0.0477   3.8813  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -6.3864     0.2948 -21.665  < 2e-16 ***
mushaharTRUE               3.5661     1.0545   3.382 0.000720 ***
risk_wall2                -0.3932     0.1524  -2.580 0.009873 ** 
risk_wall3                -1.1452     0.2995  -3.824 0.000131 ***
povertyTRUE                0.7282     0.2830   2.573 0.010090 *  
mushaharTRUE:povertyTRUE  -1.6952     1.0767  -1.574 0.115399    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2769.0  on 81208  degrees of freedom
AIC: 2781

Number of Fisher Scoring iterations: 9
exp(coef(GLM.15))  # Exponentiated coefficients ("odds ratios")
             (Intercept)             mushaharTRUE               risk_wall2 
             0.001684272             35.378557433              0.674874191 
              risk_wall3              povertyTRUE mushaharTRUE:povertyTRUE 
             0.318151537              2.071327921              0.183570820 
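In a model with an interaction term, the exponentiated main-effect coefficient for mushahar (35.4) applies only to the reference level of poverty, that is, to non-poor households. The odds ratio among poor households is obtained by adding the interaction coefficient on the log-odds scale. A minimal sketch of that arithmetic, using the GLM.15 estimates printed above (the object names are only illustrative):

```r
# Coefficients copied from summary(GLM.15) above (log-odds scale)
b_mushahar    <- 3.5661    # mushaharTRUE
b_interaction <- -1.6952   # mushaharTRUE:povertyTRUE

# Odds ratio, Mushahar vs. non-Mushahar, among non-poor households (poverty = FALSE)
or_mushahar_nonpoor <- exp(b_mushahar)

# Odds ratio, Mushahar vs. non-Mushahar, among poor households (poverty = TRUE):
# the main effect and the interaction are summed before exponentiating
or_mushahar_poor <- exp(b_mushahar + b_interaction)

round(c(non_poor = or_mushahar_nonpoor, poor = or_mushahar_poor), 1)
```

The value for poor households is close to the odds ratio of 6.7 from the model without interaction (GLM.14), while the value for non-poor households rests on a coefficient with a large standard error (1.05) and should be read with caution.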
anova(GLM.15, GLM.14, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ mushahar + risk_wall + poverty + poverty * mushahar
Model 2: case ~ mushahar + risk_wall + poverty
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     81208     2769.0                     
2     81209     2770.7 -1  -1.6552   0.1983
exp(confint(GLM.14))
Waiting for profiling to be done...
                    2.5 %      97.5 %
(Intercept)  0.0009857236 0.003061897
mushaharTRUE 4.4406396469 9.913043491
risk_wall2   0.4934245075 0.897685663
risk_wall3   0.1676370338 0.547489293
povertyTRUE  1.1674794260 3.464443760
GLM.16 <- glm(case ~ mushahar + risk_wall + poverty + poverty * risk_wall, family=binomial(logit), data=final_dataset)
summary(GLM.16)

Call:
glm(formula = case ~ mushahar + risk_wall + poverty + poverty * 
    risk_wall, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2171  -0.0834  -0.0674  -0.0529   3.9382  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -6.2980     0.7081  -8.895   <2e-16 ***
mushaharTRUE             1.9232     0.2055   9.359   <2e-16 ***
risk_wall2              -0.2734     0.7755  -0.353    0.724    
risk_wall3              -1.4565     0.8375  -1.739    0.082 .  
povertyTRUE              0.6391     0.7155   0.893    0.372    
risk_wall2:povertyTRUE  -0.1535     0.7919  -0.194    0.846    
risk_wall3:povertyTRUE   0.4649     0.9012   0.516    0.606    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2769.7  on 81207  degrees of freedom
AIC: 2783.7

Number of Fisher Scoring iterations: 10
exp(coef(GLM.16))  # Exponentiated coefficients ("odds ratios")
           (Intercept)           mushaharTRUE             risk_wall2 
           0.001840039            6.843011327            0.760783379 
            risk_wall3            povertyTRUE risk_wall2:povertyTRUE 
           0.233049204            1.894688890            0.857698549 
risk_wall3:povertyTRUE 
           1.591790752 
anova(GLM.16, GLM.14, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ mushahar + risk_wall + poverty + poverty * risk_wall
Model 2: case ~ mushahar + risk_wall + poverty
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     81207     2769.7                     
2     81209     2770.7 -2 -0.93817   0.6256
GLM.17 <- glm(case ~ mushahar + risk_wall + poverty + mushahar * risk_wall, family=binomial(logit), data=final_dataset)
summary(GLM.17)

Call:
glm(formula = case ~ mushahar + risk_wall + poverty + mushahar * 
    risk_wall, family = binomial(logit), data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2329  -0.0830  -0.0686  -0.0488   3.8668  

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -6.3480     0.2911 -21.804  < 2e-16 ***
mushaharTRUE              2.0743     0.3975   5.218 1.81e-07 ***
risk_wall2               -0.3836     0.1617  -2.373 0.017661 *  
risk_wall3               -1.1275     0.3022  -3.731 0.000191 ***
povertyTRUE               0.6798     0.2767   2.457 0.014011 *  
mushaharTRUE:risk_wall2  -0.2095     0.4623  -0.453 0.650490    
mushaharTRUE:risk_wall3 -10.6752   229.0476  -0.047 0.962827    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2770.0  on 81207  degrees of freedom
AIC: 2784

Number of Fisher Scoring iterations: 14
exp(coef(GLM.17))  # Exponentiated coefficients ("odds ratios")
            (Intercept)            mushaharTRUE              risk_wall2 
           1.750327e-03            7.958964e+00            6.813884e-01 
             risk_wall3             povertyTRUE mushaharTRUE:risk_wall2 
           3.238380e-01            1.973455e+00            8.110079e-01 
mushaharTRUE:risk_wall3 
           2.311089e-05 
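The estimate for mushaharTRUE:risk_wall3 (-10.7, standard error 229), together with the 14 Fisher scoring iterations, is the pattern usually seen when one cell of the cross-tabulation contains no cases, so that the interaction cannot really be estimated. The sketch below reproduces the symptom on a small, entirely made-up table: the counts in `toy` are invented for illustration and are not taken from final_dataset. On the real data, the corresponding check would be a cross-tabulation of case by mushahar and risk_wall.

```r
# Hypothetical grouped data: one row per mushahar x risk_wall combination.
# The Mushahar / risk_wall 3 cell deliberately has zero cases.
toy <- data.frame(
  mushahar  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  risk_wall = factor(c(1, 2, 3, 1, 2, 3)),
  cases     = c(12, 6, 0, 40, 30, 10),
  noncases  = c(500, 300, 50, 20000, 30000, 15000)
)

# Logistic regression on grouped counts; warnings about fitted
# probabilities of 0 or 1 are expected here and are suppressed
fit <- suppressWarnings(
  glm(cbind(cases, noncases) ~ mushahar * risk_wall,
      family = binomial(logit), data = toy)
)

# The interaction for the empty cell drifts towards minus infinity
# and its standard error becomes very large
coefs <- summary(fit)$coefficients
coefs["mushaharTRUE:risk_wall3", c("Estimate", "Std. Error")]
```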
anova(GLM.17, GLM.14, test="Chisq")
Analysis of Deviance Table

Model 1: case ~ mushahar + risk_wall + poverty + mushahar * risk_wall
Model 2: case ~ mushahar + risk_wall + poverty
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1     81207     2770.0                     
2     81209     2770.7 -2 -0.70753    0.702
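
Each of the three anova() comparisons above is a likelihood-ratio test: the drop in residual deviance is compared with a chi-squared distribution whose degrees of freedom equal the number of added interaction coefficients. A short sketch that recomputes the p-values from the printed deviance differences (the data frame `lrt` is just an illustrative container):

```r
# Deviance differences and degrees of freedom copied from the anova() tables above
lrt <- data.frame(
  comparison = c("GLM.15 vs GLM.14", "GLM.16 vs GLM.14", "GLM.17 vs GLM.14"),
  deviance   = c(1.6552, 0.93817, 0.70753),
  df         = c(1, 2, 2)
)

# Upper-tail chi-squared probability = p-value of the likelihood-ratio test
lrt$p_value <- pchisq(lrt$deviance, df = lrt$df, lower.tail = FALSE)
lrt
```

None of the three interaction terms improves the fit at the 5% level.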

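So my final model at this point is model 14: case ~ mushahar + risk_wall + poverty. However, the binary poverty variable was created only to test for interactions, and none of the interaction terms was significant (p = 0.20, 0.63 and 0.70), so we go back to model 7, which keeps the full five-level asset index. Model 7 is the final model.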
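
Because poverty is a collapsed version of asset_index, GLM.14 is nested within GLM.7, so the two can also be compared directly. A rough check using only the rounded deviances reported in the two summaries (on the fitted objects, `anova(GLM.14, GLM.7, test = "Chisq")` would do the same with unrounded values):

```r
# Residual deviances and degrees of freedom copied from the summaries
dev_glm14 <- 2770.7; df_glm14 <- 81209   # mushahar + risk_wall + poverty
dev_glm7  <- 2758.5; df_glm7  <- 81206   # mushahar + risk_wall + asset_index

# Likelihood-ratio test of the simpler model against the richer one
lr_stat <- dev_glm14 - dev_glm7
lr_df   <- df_glm14 - df_glm7
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For a binary outcome, AIC = residual deviance + 2 * number of coefficients
aic_glm14 <- dev_glm14 + 2 * 5
aic_glm7  <- dev_glm7  + 2 * 8

c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value,
  aic_glm14 = aic_glm14, aic_glm7 = aic_glm7)
```

Both the lower AIC and the likelihood-ratio test favour keeping the five-level asset index, which is consistent with returning to model 7.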

GLM.7 <- glm(case ~ mushahar + risk_wall + asset_index, family=binomial(logit), data=final_dataset)
summary(GLM.7)

Call:
glm(formula = case ~ mushahar + risk_wall + asset_index, family = binomial(logit), 
    data = final_dataset)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2709  -0.0772  -0.0656  -0.0484   3.8634  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.81466    0.16086 -36.148  < 2e-16 ***
mushaharTRUE  1.96990    0.22180   8.881  < 2e-16 ***
risk_wall2   -0.40614    0.15684  -2.590 0.009609 ** 
risk_wall3   -1.11997    0.30776  -3.639 0.000274 ***
asset_index2  0.55815    0.19458   2.869 0.004124 ** 
asset_index3 -0.08206    0.22363  -0.367 0.713656    
asset_index4  0.08176    0.22907   0.357 0.721141    
asset_index5 -0.52755    0.31406  -1.680 0.092998 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2885.9  on 81213  degrees of freedom
Residual deviance: 2758.5  on 81206  degrees of freedom
AIC: 2774.5

Number of Fisher Scoring iterations: 9
exp(coef(GLM.7))  # Exponentiated coefficients ("odds ratios")
 (Intercept) mushaharTRUE   risk_wall2   risk_wall3 asset_index2 asset_index3 
 0.002983507  7.169994094  0.666215469  0.326289121  1.747430447  0.921214685 
asset_index4 asset_index5 
 1.085197543  0.590046584 
exp(confint(GLM.7))
Waiting for profiling to be done...
                   2.5 %       97.5 %
(Intercept)  0.002150039  0.004041636
mushaharTRUE 4.582946449 10.957987112
risk_wall2   0.489354472  0.905649249
risk_wall3   0.172344535  0.579799021
asset_index2 1.193911726  2.564485167
asset_index3 0.590438223  1.422810718
asset_index4 0.689253185  1.696017962
asset_index5 0.311077759  1.071790719

Odds ratios (95% CI) from model 7: Mushahar 7.2 (4.6-11.0), brick wall vs. thatched wall 0.7 (0.5-0.9), plastered brick wall vs. thatched wall 0.3 (0.2-0.6)

and asset index levels 2, 3, 4 and 5 vs. level 1: 1.7 (1.2-2.6), 0.9 (0.6-1.4), 1.1 (0.7-1.7), 0.6 (0.3-1.1)
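
Rounding by hand is error-prone, so the summary line can be generated from the output itself. A small sketch using the GLM.7 odds ratios and profile-likelihood intervals printed above (on the fitted object, `exp(cbind(coef(GLM.7), confint(GLM.7)))` would supply the same numbers; `or_table` and `label` are illustrative names):

```r
# Odds ratios and 95% limits copied from exp(coef(GLM.7)) and exp(confint(GLM.7))
or_table <- data.frame(
  term  = c("mushaharTRUE", "risk_wall2", "risk_wall3",
            "asset_index2", "asset_index3", "asset_index4", "asset_index5"),
  or    = c(7.169994094, 0.666215469, 0.326289121,
            1.747430447, 0.921214685, 1.085197543, 0.590046584),
  lower = c(4.582946449, 0.489354472, 0.172344535,
            1.193911726, 0.590438223, 0.689253185, 0.311077759),
  upper = c(10.957987112, 0.905649249, 0.579799021,
            2.564485167, 1.422810718, 1.696017962, 1.071790719)
)

# One formatted string per term: "OR (lower-upper)", one decimal place
or_table$label <- sprintf("%.1f (%.1f-%.1f)",
                          or_table$or, or_table$lower, or_table$upper)
or_table[, c("term", "label")]
```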

Finally, I export the final dataset to CSV so it can be used in other programs.

write.table(final_dataset, "final_dataset.csv", sep=",", col.names=TRUE, row.names=FALSE, quote=TRUE, na="NA")
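
A CSV file stores only text, so column classes such as factors are not preserved on export. The sketch below illustrates this with the same write.table() arguments on a tiny made-up data frame (`toy` and its values are invented for the example) written to a temporary file:

```r
# Hypothetical three-row data frame with a logical, a factor and a character column
toy <- data.frame(
  case      = c(TRUE, FALSE, NA),
  risk_wall = factor(c(1, 3, 2)),
  subcaste  = c("MUSHAHAR", "MIR SHIKAR", NA)
)

# Export with the same options as above, to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.table(toy, csv_path, sep = ",", col.names = TRUE,
            row.names = FALSE, quote = TRUE, na = "NA")

# Read it back and inspect the column classes
back <- read.csv(csv_path, na.strings = "NA")
str(back)

# risk_wall is no longer a factor after the round trip,
# so it has to be re-declared before modelling
back$risk_wall <- factor(back$risk_wall)
```

If the data will be reused in R, the RDS format read at the start of the analysis keeps the column classes intact.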

#The poorest of the poor: Bihar visceral leishmaniasis #Visceral Leishmaniasis in Bihar